Results 1 - 20 of 8,799
1.
J Acoust Soc Am ; 155(4): R7-R8, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38558083

ABSTRACT

The Reflections series takes a look back on historical articles from The Journal of the Acoustical Society of America that have had a significant impact on the science and practice of acoustics.


Subject(s)
Speech Perception; Acoustics; Speech Acoustics; Cognition
2.
J Acoust Soc Am ; 155(4): 2698-2706, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38639561

ABSTRACT

The notion of the "perceptual center" or the "P-center" has been put forward to account for the repeated finding that acoustic and perceived syllable onsets do not necessarily coincide, at least in the perception of simple monosyllables or disyllables. The magnitude of the discrepancy between acoustics and perception (the location of the P-center in the speech signal) has proven difficult to estimate, though acoustic models of the effect do exist. The present study asks if the P-center effect can be documented in natural connected speech of English and Japanese and examines if an acoustic model that defines the P-center as the moment of the fastest energy change in a syllabic amplitude envelope adequately reflects the P-center in the two languages. A sensorimotor synchronization paradigm was deployed to address the research questions. The results provide evidence for the existence of the P-center effect in speech of both languages while the acoustic P-center model is found to be less applicable to Japanese. Sensorimotor synchronization patterns further suggest that the P-center may reflect perceptual anticipation of a vowel onset.
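
The acoustic model evaluated here defines the P-center as the moment of the fastest energy change in a syllabic amplitude envelope. As a rough illustration (not the authors' implementation; the envelope values and sampling rate below are invented), that moment can be located as the steepest positive step in a sampled envelope:

```python
def p_center_time(envelope, fs):
    """Return the time (s) of the steepest positive change in an amplitude envelope."""
    diffs = [b - a for a, b in zip(envelope, envelope[1:])]
    i = max(range(len(diffs)), key=lambda k: diffs[k])
    return i / fs

# Toy syllable envelope sampled at 100 Hz; the sharp rise marks the vowel onset
env = [0.0, 0.05, 0.1, 0.5, 0.8, 0.9, 0.85, 0.4, 0.1]
print(p_center_time(env, fs=100))  # 0.02 s, the start of the steepest rise
```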


Subject(s)
Speech Acoustics; Speech Perception; Humans; Phonetics; Speech; Language
3.
PLoS One ; 19(4): e0301514, 2024.
Article in English | MEDLINE | ID: mdl-38564597

ABSTRACT

Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.
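
The just-noticeable difference here comes from fitting a logistic curve to same/different responses. A minimal sketch of the idea, with invented response proportions and a crude grid search in place of the maximum-likelihood fitting a real analysis would use; the JND is read off as the curve's midpoint:

```python
import math

def logistic(x, a, b):
    """P('different') for a formant shift x, with midpoint a and spread b."""
    return 1.0 / (1.0 + math.exp(-(x - a) / b))

def fit_jnd(shifts, p_diff):
    """Grid-search least-squares fit; returns the midpoint a as the JND."""
    best = None
    for a in [i * 0.5 for i in range(121)]:        # candidate midpoints, 0-60 Hz
        for b in [j * 0.5 for j in range(1, 41)]:  # candidate spreads
            err = sum((logistic(x, a, b) - p) ** 2 for x, p in zip(shifts, p_diff))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1]

shifts = [0, 10, 20, 30, 40, 50]                   # formant shifts in Hz
p_diff = [0.05, 0.15, 0.45, 0.80, 0.95, 0.99]      # proportion 'different'
print(fit_jnd(shifts, p_diff))  # JND estimate between the 20 and 30 Hz shifts
```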


Subject(s)
Auditory Cortex; Speech Perception; Humans; Speech/physiology; Speech Perception/physiology; Acoustics; Movement; Phonetics; Speech Acoustics
4.
J Acoust Soc Am ; 155(4): 2285-2301, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38557735

ABSTRACT

Fronting of the vowels /u, ʊ, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, ʊ, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda.


Subject(s)
Phonetics; Speech Acoustics; United States; Acoustics; Language; South Carolina
5.
Codas ; 36(3): e20230175, 2024.
Article in English | MEDLINE | ID: mdl-38629682

ABSTRACT

PURPOSE: To assess the influence of listener experience, measurement scale, and type of speech task on the auditory-perceptual evaluation of the overall severity (OS) of voice deviation and the predominant type of voice (rough, breathy, or strained). METHODS: 22 listeners, divided into four groups, participated in the study: speech-language pathologists specialized in voice (SLP-V), SLPs not specialized in voice (SLP-NV), graduate students with auditory-perceptual analysis training (GS-T), and graduate students without such training (GS-U). The subjects rated the OS of voice deviation and the predominant type of voice for 44 voices on a visual analog scale (VAS) and a numerical scale (the "G" score from GRBAS), across six speech tasks: sustained vowels /a/ and /ɛ/, sentences, number counting, running speech, and all five previous tasks together. RESULTS: Sentences obtained the best interrater reliability in each group, using both the VAS and GRBAS. The SLP-NV group demonstrated the best interrater reliability in OS judgment across speech tasks using either scale. Sustained vowels (/a/ and /ɛ/) and running speech obtained the best interrater reliability among the listener groups in judging the predominant vocal quality. The GS-T group achieved the best interrater reliability in judging the predominant vocal quality. CONCLUSION: Listeners' length of experience in auditory-perceptual judgment of the voice, the type of training they received, and the type of speech task all influence the reliability of the auditory-perceptual evaluation of vocal quality.


Subject(s)
Dysphonia; Speech Perception; Humans; Speech; Reproducibility of Results; Speech Production Measurement; Observer Variation; Voice Quality; Speech Acoustics
6.
J Acoust Soc Am ; 155(4): 2612-2626, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38629882

ABSTRACT

This study presents an acoustic investigation of the vowel inventory of Drehu (Southern Oceanic Linkage), spoken in New Caledonia. Reportedly, Drehu has a 14 vowel system distinguishing seven vowel qualities and an additional length distinction. Previous phonological descriptions were based on impressionistic accounts showing divergent proposals for two out of seven reported vowel qualities. This study presents the first phonetic investigation of Drehu vowels based on acoustic data from eight speakers. To examine the phonetic correlates of the proposed phonological vowel inventory, multi-point acoustic analyses were used, and vowel inherent spectral change (VISC) was investigated (F1, F2, and F3). Additionally, vowel duration was measured. Contrary to reports from other studies on VISC in monophthongs, we find that monophthongs in Drehu are mostly steady state. We propose a revised vowel inventory and focus on the acoustic description of open-mid /ɛ/ and the central vowel /ə/, whose status was previously unclear. Additionally, we find that vowel quality stands orthogonal to vowel quantity by demonstrating that the phonological vowel length distinction is primarily based on a duration cue rather than formant structure. Finally, we report the acoustic properties of the seven vowel qualities that were identified.
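
Vowel inherent spectral change is often quantified as formant trajectory length, the summed Euclidean distance between successive (F1, F2) measurement points across the vowel; a near-zero length indicates a steady-state monophthong. A sketch with invented five-point measurements (the study's own VISC metric may differ in detail):

```python
import math

def trajectory_length(f1, f2):
    """Summed (F1, F2) Euclidean step sizes across multi-point measurements."""
    return sum(math.hypot(b1 - a1, b2 - a2)
               for (a1, b1), (a2, b2) in zip(zip(f1, f1[1:]), zip(f2, f2[1:])))

# Five-point formant measurements (Hz) for a nearly steady-state vowel
f1 = [430, 432, 435, 433, 431]
f2 = [1810, 1815, 1812, 1808, 1811]
print(trajectory_length(f1, f2))  # small value, consistent with a steady state
```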


Subject(s)
Phonetics; Speech Acoustics; Acoustics
7.
J Acoust Soc Am ; 155(3): 2128-2138, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38498508

ABSTRACT

A comprehensive examination of the acoustics of Contemporary Standard Bulgarian vowels is lacking to date, and this article aims to fill that gap. Six acoustic variables-the first three formant frequencies, duration, mean f0, and mean intensity-of 11 615 vowel tokens from 140 speakers were analysed using linear mixed models, multivariate analysis of variance, and linear discriminant analysis. The vowel system, which comprises six phonemes in stressed position, [ε a ɔ i ɤ u], was examined from four angles. First, vowels in pretonic syllables were compared to other unstressed vowels, and no spectral or durational differences were found, contrary to an oft-repeated claim that pretonic vowels reduce less. Second, comparisons of stressed and unstressed vowels revealed significant differences in all six variables for the non-high vowels [ε a ɔ]. No spectral or durational differences were found in [i ɤ u], which disproves another received view that high vowels are lowered when unstressed. Third, non-high vowels were compared with their high counterparts; the height contrast was completely neutralized in unstressed [a-ɤ] and [ɔ-u] while [ε-i] remained distinct. Last, the acoustic correlates of vowel contrasts were examined, and it was demonstrated that only F1, F2 frequencies and duration were systematically employed in differentiating vowel phonemes.


Subject(s)
Phonetics; Speech Acoustics; Bulgaria; Acoustics; Multivariate Analysis
8.
J Speech Lang Hear Res ; 67(4): 1090-1106, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38498664

ABSTRACT

PURPOSE: This study examined speech changes induced by deep-brain stimulation (DBS) in speakers with Parkinson's disease (PD) using a set of auditory-perceptual and acoustic measures. METHOD: Speech recordings from nine speakers with PD and DBS were compared between DBS-On and DBS-Off conditions using auditory-perceptual and acoustic analyses. Auditory-perceptual ratings included voice quality, articulation precision, prosody, speech intelligibility, and listening effort obtained from 44 listeners. Acoustic measures were made for voicing proportion, second formant frequency slope, vowel dispersion, articulation rate, and range of fundamental frequency and intensity. RESULTS: No significant changes were found between DBS-On and DBS-Off for the five perceptual ratings. Four of six acoustic measures revealed significant differences between the two conditions. While articulation rate and acoustic vowel dispersion increased, voicing proportion and intensity range decreased from the DBS-Off to DBS-On condition. However, a visual examination of the data indicated that the statistical significance was mostly driven by a small number of participants, while the majority did not show a consistent pattern of such changes. CONCLUSIONS: Our data, in general, indicate no to minimal speech changes resulting from DBS. The findings are discussed with a focus on the large interspeaker variability in PD speech characteristics and the potential effects of DBS on speech.
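
The second formant frequency slope listed among the acoustic measures can be estimated with an ordinary least-squares fit of F2 against time over the measured interval. A sketch with invented values (not data from the study):

```python
def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    den = sum((ti - mt) ** 2 for ti in t)
    return num / den

t = [0.00, 0.02, 0.04, 0.06, 0.08]   # time points (s) within the transition
f2 = [1500, 1550, 1610, 1660, 1700]  # F2 (Hz) at each point
print(ols_slope(t, f2))              # about 2550 Hz/s
```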


Subject(s)
Deep Brain Stimulation; Parkinson Disease; Humans; Acoustics; Speech Intelligibility/physiology; Voice Quality; Parkinson Disease/complications; Parkinson Disease/therapy; Brain; Speech Acoustics
9.
J Acoust Soc Am ; 155(3): 1916-1927, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38456734

ABSTRACT

Speech quality is one of the main foci of speech-related research, where it is frequently studied alongside speech intelligibility, another essential measure. Perceptual speech intelligibility has been analyzed extensively at the level of individual frequency bands, whereas band-level speech quality has not. In this paper, an approach inspired by Multiple Stimuli With Hidden Reference and Anchor (MUSHRA) testing was proposed to study the robustness of individual frequency bands to noise, with perceptual speech quality as the measure. Speech signals were filtered into thirty-two frequency bands, with real-world noise added at different signal-to-noise ratios. Robustness-to-noise indices for the individual frequency bands were calculated from the human-rated perceptual quality scores assigned to the reconstructed noisy speech signals. Trends in the results suggest that the mid-frequency region is less robust to noise in terms of perceptual speech quality. These findings suggest that future research aiming at improving speech quality should pay particular attention to the mid-frequency region of the speech signal.


Subject(s)
Speech Perception; Humans; Perceptual Masking; Noise/adverse effects; Speech Intelligibility; Speech Acoustics
10.
J Acoust Soc Am ; 155(2): 1253-1263, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38341748

ABSTRACT

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.
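
For the simplest physical model, a uniform tube closed at one end (the glottis) and open at the other (the lips), the resonance frequencies are derivable in closed form from the quarter-wavelength relation f_n = (2n - 1)c / (4L). The 17.5 cm length and 350 m/s speed of sound below are textbook values, not the dimensions of the printed models in the study:

```python
def tube_resonances(length_m, n_resonances=3, c=350.0):
    """Resonances (Hz) of a uniform closed-open tube of the given length."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_resonances + 1)]

# A 17.5 cm neutral vocal tract gives resonances near 500, 1500, and 2500 Hz
print(tube_resonances(0.175))
```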


Subject(s)
Voice; Child; Humans; Acoustics; Speech Acoustics; Vibration; Sound Spectrography
11.
J Acoust Soc Am ; 155(2): 1264-1271, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38345424

ABSTRACT

The problem of characterizing voice quality has long caused debate and frustration. The richness of the available descriptive vocabulary is overwhelming, but the density and complexity of the information voices convey lead some to conclude that language can never adequately specify what we hear. Others argue that terminology lacks an empirical basis, so that language-based scales are inadequate a priori. Efforts to provide meaningful instrumental characterizations have also had limited success. Such measures may capture sound patterns but cannot at present explain what characteristics, intentions, or identity listeners attribute to the speaker based on those patterns. However, some terms continually reappear across studies. These terms align with acoustic dimensions accounting for variance across speakers and languages and correlate with size and arousal across species. This suggests that labels for quality rest on a bedrock of biology: We have evolved to perceive voices in terms of size/arousal, and these factors structure both voice acoustics and descriptive language. Such linkages could help integrate studies of signals and their meaning, producing a truly interdisciplinary approach to the study of voice.


Subject(s)
Speech Perception; Speech Acoustics; Voice Quality; Sound; Hearing
12.
J Acoust Soc Am ; 155(2): 1422-1436, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38364044

ABSTRACT

Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.


Subject(s)
Sexual and Gender Minorities; Speech Perception; Humans; Male; Female; Voice Quality; Speech Acoustics; Masculinity
13.
Phonetica ; 81(2): 185-220, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38358292

ABSTRACT

Research on various languages shows that dynamic approaches to vowel acoustics - in particular Vowel-Inherent Spectral Change (VISC) - can play a vital role in characterising and classifying monophthongal vowels compared with a static model. This study's aim was to investigate whether dynamic cues also allow for better description and classification of the Hijazi Arabic (HA) vowel system, a phonological system based on both temporal and spectral distinctions. Along with static and dynamic F1 and F2 patterns, we evaluated the extent to which vowel duration, F0, and F3 contribute to increased/decreased discriminability among vowels. Data were collected from 20 native HA speakers (10 females and 10 males) producing eight HA monophthongal vowels in a word list with varied consonantal contexts. Results showed that dynamic cues provide further insights regarding HA vowels that are not normally gleaned from static measures alone. Using discriminant analysis, the dynamic cues (particularly the seven-point model) had relatively higher classification rates, and vowel duration was found to play a significant role as an additional cue. Our results are in line with dynamic approaches and highlight the importance of looking beyond static cues and beyond the first two formants for further insights into the description and classification of vowel systems.


Subject(s)
Phonetics; Speech Acoustics; Male; Female; Humans; Language; Acoustics; Cues
14.
J Neurophysiol ; 131(3): 480-491, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38323331

ABSTRACT

The human brain tracks available speech acoustics and extrapolates missing information such as the speaker's articulatory patterns. However, the extent to which articulatory reconstruction supports speech perception remains unclear. This study explores the relationship between articulatory reconstruction and task difficulty. Participants listened to sentences and performed a speech-rhyming task. Real kinematic data of the speaker's vocal tract were recorded via electromagnetic articulography (EMA) and aligned to corresponding acoustic outputs. We extracted articulatory synergies from the EMA data with principal component analysis (PCA) and employed partial information decomposition (PID) to separate the electroencephalographic (EEG) encoding of acoustic and articulatory features into unique, redundant, and synergistic atoms of information. We median-split sentences into easy (ES) and hard (HS) based on participants' performance and found that greater task difficulty involved greater encoding of unique articulatory information in the theta band. We conclude that fine-grained articulatory reconstruction plays a complementary role in the encoding of speech acoustics, lending further support to the claim that motor processes support speech perception.

NEW & NOTEWORTHY: Top-down processes originating from the motor system contribute to speech perception through the reconstruction of the speaker's articulatory movement. This study investigates the role of such articulatory simulation under variable task difficulty. We show that more challenging listening tasks lead to increased encoding of articulatory kinematics in the theta band and suggest that, in such situations, fine-grained articulatory reconstruction complements acoustic encoding.


Subject(s)
Speech Perception; Humans; Speech; Speech Acoustics; Acoustics; Language
15.
Eur Arch Otorhinolaryngol ; 281(5): 2707-2716, 2024 May.
Article in English | MEDLINE | ID: mdl-38319369

ABSTRACT

PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants. METHODS: 60 participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formants (F1-F2)] were extracted from each recording and analyzed using Praat software. Data were analyzed with Kolmogorov-Smirnov tests, two-way ANOVA, post hoc Tukey tests, binary logistic regression, and Student's t tests. RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formants and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. There was also a significant difference between diabetes status and age in the case of CPP. CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.
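
Two of the seven extracted features, local jitter and shimmer, follow directly from their definitions: the mean absolute difference between consecutive glottal periods (for jitter) or cycle amplitudes (for shimmer), normalized by the overall mean. A definitional sketch with invented values (Praat's implementations differ in detail, for example in period detection):

```python
def local_perturbation(values):
    """Mean absolute consecutive difference divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

periods = [0.0080, 0.0081, 0.0079, 0.0082, 0.0080]  # glottal periods (s) -> jitter
amps = [0.52, 0.50, 0.53, 0.51, 0.52]               # cycle amplitudes -> shimmer
print(local_perturbation(periods))  # about 0.025 (2.5% jitter)
print(local_perturbation(amps))     # about 0.039 (3.9% shimmer)
```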


Subject(s)
Diabetes Mellitus, Type 2; Voice; Humans; Voice Quality; Speech Acoustics; Diabetes Mellitus, Type 2/complications; Cross-Sectional Studies; Speech Production Measurement; Acoustics
16.
J Speech Lang Hear Res ; 67(2): 400-414, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38306498

ABSTRACT

PURPOSE: According to most models of spoken word recognition, listeners probabilistically activate a set of lexical candidates, which is incrementally updated as the speech signal unfolds. Speech carries segmental (speech sound) as well as suprasegmental (prosodic) information. The role of the latter in spoken word recognition is less clear. We investigated how suprasegments (tone and voice quality) in three North Germanic language varieties affected lexical access by scrutinizing temporally fine-grained neurophysiological effects of lexical uncertainty and information gain. METHOD: Three event-related potential (ERP) studies were reanalyzed. In all varieties investigated, suprasegments are associated with specific word endings. Swedish has two lexical "word accents" realized as pitch falls with different timings across dialects. In Danish, the distinction is in voice quality. We combined pronunciation lexica and frequency lists to calculate estimates of lexical uncertainty about an unfolding word and information gain upon hearing a suprasegmental cue and the segment upon which it manifests. We used single-trial mixed-effects regression models run every 4 ms. RESULTS: Only lexical uncertainty showed solid results: a frontal effect at 150-400 ms after suprasegmental cue onset and a later posterior effect after 200 ms. While a model including only segmental information mostly performed better, it was outperformed by the suprasegmental model at 200-330 ms at frontal sites. CONCLUSIONS: The study points to suprasegmental cues contributing to lexical access over and beyond segments after around 200 ms in the North Germanic varieties investigated. Furthermore, the findings indicate that a previously reported "pre-activation negativity" predominantly reflects forward-looking processing. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25016486.
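
Lexical uncertainty about an unfolding word can be modeled as the entropy of the frequency distribution over the still-compatible candidates, and information gain as the entropy reduction once a new cue arrives. A toy sketch (the lexicon and frequencies are invented; the study derived these estimates from pronunciation lexica and corpus frequency lists):

```python
import math

def entropy(freqs):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(freqs)
    return -sum(f / total * math.log2(f / total) for f in freqs if f)

# Hypothetical cohort of candidates after hearing "an-", with corpus frequencies
lexicon = {"anden": 40, "anka": 30, "ande": 20, "ankare": 10}

uncertainty = entropy(list(lexicon.values()))
remaining = entropy([f for w, f in lexicon.items() if w.startswith("and")])
print(uncertainty)               # lexical uncertainty before the cue
print(uncertainty - remaining)   # information gain from hearing /d/
```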


Subject(s)
Speech Acoustics; Speech Perception; Humans; Speech Perception/physiology; Language; Brain; Evoked Potentials
17.
J Speech Lang Hear Res ; 67(3): 782-801, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38354102

ABSTRACT

PURPOSE: The current study investigated English prosodic focus marking by autistic and typically developing (TD) Cantonese trilingual children, and examined the potential differences in this regard compared to native English-speaking children. METHOD: Forty-eight participants were recruited, with 16 speakers in each of the three groups (Cantonese-speaking autistic [CASD], Cantonese-speaking TD [CTD], and English-speaking TD [ETD] children), and prompt questions were designed to elicit the desired focus type (i.e., broad, narrow, and contrastive focus). Mean duration, mean fundamental frequency (F0), F0 range, mean intensity, and F0 curves were used as the acoustic correlates for linear mixed-effects model fitting and functional data analyses in relation to groups and focus conditions (i.e., broad, narrow, and contrastive pre-, on-, and post-focus). RESULTS: The CTD group showed post-focus compression (PFC) patterns, reducing mean duration, narrowing F0 range, and lowering mean F0, F0 curve, and mean intensity for words under both narrow and contrastive post-focus conditions, while the CASD group only showed shortened mean duration and lowered F0 curves. However, neither the CTD group nor the CASD group showed clear on-focus expansion (OFE) patterns. The ETD group marked OFE by increasing mean duration, mean F0, and mean intensity, and by a higher F0 curve, for words under on-focus conditions. CONCLUSIONS: The CTD group utilized more acoustic cues than the CASD group with respect to PFC. The ETD group differed from the CASD and CTD groups in the use of OFE. Furthermore, both the CASD and CTD groups showed positive first-language transfer in the use of duration and intensity and, potentially, successful acquisition in the use of F0 for prosodic focus marking. Meanwhile, the differences in the use of OFE between the Cantonese-speaking and English-speaking groups, not PFC, might indicate that Cantonese-speaking children acquire PFC prior to OFE.


Subject(s)
Autism Spectrum Disorder; Child; Humans; Speech Acoustics; Speech Production Measurement; Language; Acoustics
18.
J Acoust Soc Am ; 155(1): 294-305, 2024 01 01.
Article in English | MEDLINE | ID: mdl-38230970

ABSTRACT

This study constitutes an investigation into the acoustic variability of intervocalic alveolar taps in a corpus of spontaneous speech from Madrid, Spain. Substantial variability was documented in this segment, with highly reduced variants constituting roughly half of all tokens during spectrographic inspection. In addition to qualitative documentation, the intensity difference between the tap and surrounding vowels was measured. Changes in this intensity difference were statistically modeled using Bayesian finite mixture models containing lexical and phonetic predictors. Model comparisons indicate predictive performance is improved when we assume two latent categories, interpreted as two pronunciation variants for the Spanish tap. In interpreting the model, predictors were more often related to categorical changes in which pronunciation variant was produced than to gradient intensity changes within each tap type. Variability in tap production was found according to lexical frequency, speech rate, and phonetic environment. These results underscore the importance of evaluating model fit to the data as well as what researchers modeling phonetic variability can gain in moving past linear models when they do not adequately fit the observed data.
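
The two latent categories rest on a finite mixture model: each token's intensity difference receives a posterior probability of having been generated by the reduced versus unreduced pronunciation variant. A minimal two-component Gaussian responsibility computation with fixed, invented parameters (the study itself fit Bayesian mixtures whose parameters depend on lexical and phonetic predictors):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def responsibility(x, weights, mus, sigmas):
    """Posterior P(component k | x) for each mixture component k."""
    likes = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(likes)
    return [l / total for l in likes]

# Component 0: reduced taps (small intensity dip); component 1: full taps
print(responsibility(4.0, weights=[0.5, 0.5], mus=[2.0, 10.0], sigmas=[2.0, 3.0]))
```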


Subject(s)
Speech Acoustics; Speech Perception; Bayes Theorem; Speech; Phonetics; Acoustics
19.
J Acoust Soc Am ; 155(1): 381-395, 2024 01 01.
Article in English | MEDLINE | ID: mdl-38240668

ABSTRACT

Auditory perceptual evaluation is considered the gold standard for assessing voice quality, but its reliability is limited due to inter-rater variability and coarse rating scales. This study investigates a continuous, objective approach to evaluate hoarseness severity combining machine learning (ML) and sustained phonation. For this purpose, 635 acoustic recordings of the sustained vowel /a/ and subjective ratings based on the roughness, breathiness, and hoarseness scale were collected from 595 subjects. A total of 50 temporal, spectral, and cepstral features were extracted from each recording and used to identify suitable ML algorithms. Using variance and correlation analysis followed by backward elimination, a subset of relevant features was selected. Recordings were classified into two levels of hoarseness, H<2 and H≥2, yielding a continuous probability score y∈[0,1]. An accuracy of 0.867 and a correlation of 0.805 between the model's predictions and subjective ratings was obtained using only five acoustic features and logistic regression (LR). Further examination of recordings pre- and post-treatment revealed high qualitative agreement with the change in subjectively determined hoarseness levels. Quantitatively, a moderate correlation of 0.567 was obtained. This quantitative approach to hoarseness severity estimation shows promising results and potential for improving the assessment of voice quality.
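
The continuous score y in [0, 1] is the output of a logistic regression over a handful of acoustic features. A sketch with invented standardized features and coefficients (placeholders, not the fitted model from the study):

```python
import math

def hoarseness_score(features, weights, bias):
    """Logistic-regression probability that hoarseness is H >= 2."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

features = [0.8, 1.2, 0.4, 2.1, 0.6]   # standardized acoustic features
weights = [0.9, 0.5, -0.3, 0.7, 0.2]   # invented coefficients
print(hoarseness_score(features, weights, bias=-1.5))  # about 0.78
```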


Subject(s)
Dysphonia; Hoarseness; Humans; Hoarseness/diagnosis; Reproducibility of Results; Voice Quality; Phonation; Acoustics; Speech Acoustics; Speech Production Measurement
20.
J Speech Lang Hear Res ; 67(2): 384-399, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38289853

ABSTRACT

PURPOSE: The purpose of this study was to quantify sentence-level articulatory kinematics in individuals treated for oral squamous cell carcinoma (ITOC) compared to control speakers while also assessing the effect of treatment site (jaw vs. tongue). Furthermore, this study aimed to assess the relation between articulatory-kinematic measures and self-reported speech problems. METHOD: Articulatory-kinematic data from the tongue tip, tongue back, and jaw were collected using electromagnetic articulography in nine Dutch ITOC and eight control speakers. To quantify articulatory kinematics, the two-dimensional articulatory working space (AWS; in mm2), one-dimensional anteroposterior range of motion (AP-ROM; in mm), and superior-inferior range of motion (SI-ROM in mm) were calculated and examined. Self-reported speech problems were assessed with the Speech Handicap Index (SHI). RESULTS: Compared to a sex-matched control group, ITOC showed significantly smaller AWS, AP-ROM, and SI-ROM for both the tongue tip and tongue back sensor, but no significant differences were observed for the jaw sensor. This pattern was found for both individuals treated for tongue and jaw tumors. Moderate nonsignificant correlations were found between the SHI and the AWS of the tongue back and jaw sensors. CONCLUSIONS: Despite large individual variation, ITOC showed reduced one- and two-dimensional tongue, but not jaw, movements compared to control speakers and treatment for tongue and jaw tumors resulted in smaller tongue movements. A larger sample size is needed to establish a more generalizable connection between the AWS and the SHI. Further research should explore how these kinematic changes in ITOC are related to acoustic and perceptual measures of speech.
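
The two-dimensional articulatory working space (AWS, in mm^2) is commonly computed as the area of the convex hull around a sensor's midsagittal positions. A self-contained sketch using Andrew's monotone chain for the hull and the shoelace formula for its area (sensor coordinates invented):

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in order."""
    pts = sorted(set(points))
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1]) -
                                   (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    return half(pts) + half(pts[::-1])

def shoelace_area(hull):
    """Polygon area via the shoelace formula."""
    pairs = zip(hull, hull[1:] + hull[:1])
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs))

# Midsagittal tongue-tip positions (mm); one point lies inside the hull
tongue_tip = [(0, 0), (4, 1), (5, 4), (2, 6), (-1, 3), (2, 2)]
print(shoelace_area(convex_hull(tongue_tip)))  # 22.5 mm^2
```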


Subject(s)
Carcinoma, Squamous Cell; Jaw Neoplasms; Mouth Neoplasms; Humans; Speech Intelligibility; Speech Production Measurement/methods; Mouth Neoplasms/surgery; Speech Acoustics; Speech; Tongue/surgery; Biomechanical Phenomena; Electromagnetic Phenomena; Jaw